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METHOD OF EVALUATING A RECURSIVE QUERY OF A DATABASE 

CROSS-REFERENCE TO RELATED APPLICATION 

This patent application is a continuation of patent application serial number 
07/487,346, filed March 1, 1990, now abandoned, which in turn is a continuation-in-part 
of copending patent application serial number 07/286,425, filed December 19, 1988, and 
assigned to the same assignee as the present application. 

BACKGROUND OF THE INVENTION 

The present invention relates generally to database systems and more particularly / 
to a method of evaluating a recursive query of a database. / 

Database systems are being used to store and manage more and more different 
kinds of data. As the use of database systems has expanded and the quantities of data 
stored in a database have increased, much effort has been devoted to improving existing 
database systems and developing new systems with new and better capabilities. 

Relational Database Concepts 

Data in a relational database are perceived by the user as being arranged in tables. 
Each table may be thought of as specifying a "relation" among the data in that table; 
therefore, each table is referred to as a "relation". Each row in a relation may be thought 
of as one data record. The rows are referred to as "tuples". 

To "query" a database means to request information from it. To "evaluate" a 
query means to obtain the requested information. Sometimes the requested information 
can be obtained directly by looking it up in one of the relations. If the requested 
information does not appear in any of the relations, it must be derived, for example by 
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comparing two or more tuples in a relation or by combining tuples from two different 
relations. The following four examples will help to illustrate these concepts. 



EXAMPLE 1 



Consider an historical database that contains data about people. Some of the data 
arc arranged in a table or "relation" called PARENT. The data in the PARENT relation 
are arranged in the form 

PARENT (NAME, PNAME) 

where PNAME is the name of a parent of a person named name. For example, "PARENT 
(Andrew, William)" represents a tuple which says that William is a parent of Andrew. 

Others of the data indicate who is a friend of whom; these data are arranged in a 
FRIEND relation in the form 

FRIEND (NAME, FNAME) 

where fname is the name of a friend of a person named name. 

Still others of the data are in a PERSON relation and are arranged in the form 

PERSON (name, sex) 
where name is the name of a person and sex is the sex of that person. Representative 
PARENT, FRIEND and PERSON relations are depicted in Tables I through III, 
respectively. 



PARENT relation 



NAME 

(Name of Person) 



| PNAME 

! (Name of Parent) 



Mary 
Mary 
John 
John 



Andrew 
Andrew 



| William 
| Mary 
I John 
| Anne 



| Richard 
I Wilma 



TABLE I 
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FRIEND relation 



| NAME 



| FNAME 



| Andrew 
j Mary 
! William 



| Michael 
| Joan 
| Robert 



TABLED 



PERSON relation 



| NAME 



SEX 



| Andrew 
| William 
| Mary 



M 
M 
F 



TABLE m 



A query of the form "FIND PARENTS OF [X] tt is an example of a request for 
information that is directly obtainable by looking up data in this particular database. The 
response to such a query would be the names of the parents of X. 

A query of the form "FIND MOTHER OF [X] M is an example of a request for 
information that cannot be obtained by looking it up. This is because the PARENT 
relation does not include the sex of the parents and, unlike a human, a computer does not 
know, for example, that "William" would ordinarily be a father and "Mary" a mother. 
Therefore, the requested information must be derived from information in the database, 
for example by (1) finding the parents of X in the PARENT relation, (2) finding the sex 
of each of those parents in the PERSON relation, and (3) selecting the female parent. 
The response to such a query would be the name of the mother of X. 



Consider a factory database for keeping track of parts that are used to make 
engines. The data are arranged in two relations. The first relation, SUPPLIER, contains 



EXAMPLE 2 
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data arranged in the form 

SUPPLIER (PART, SUPPLIER, CITY) 

where part represents the name (or part number or other convenient identifier) of a 
particular part, supplier represents a name of a source of that part, and city represents 
the location of that supplier. A few typical entries might be 

SUPPLIER (Needle Valve, Bill's Brass Works, Buffalo) 
SUPPLIER (Needle Valve, Paul's Plumbing Mfg. Co., Pittsburgh) 
SUPPLIER (1/4* Screw, Mac's Machine Shop, Milwaukee) 
SUPPLIER (Short Spring, Sam's Spring Specialties, Springfield) 

and so on. The second relation, SUBPART, contains data in the form 

SUBPART (SPART, SSUBPART, SQTY) 

where spart represents the name of a part that requires a subpart, ssubpart represents 
the name of that subpart, and sqty represents how many of that subpart are used in that 
part. Some examples are 

SUBPART (Carburetor, Valve Assembly, 2) 
SUBPART (Carburetor, 1/4" Screw, 16) 
SUBPART (Carburetor, Short Spring, 3) 
SUBPART (Valve Assembly, Needle Valve, l) 
SUBPART (Valve Assembly, 1/4" Screw, 4) 

indicating that each carburetor requires two valve assemblies, sixteen 1/4" screws, three 
short springs, and so on. 

A query of the form "FIND SUPPLIERS OF fX]" is an example of a request for 
information that is directly obtainable by retrieving data from this particular database. 
The response to such a query would be the names of all suppliers of part X. 

A query of the form "HOW MANY SUPPLIERS SUPPLY [XT is an example of 
a request for information that is not directly obtainable but that can be derived from 
information in the database, for example by finding all suppliers of X and then counting 
how many supplier names are found. The response to such a query would be the number 
of suppliers that supply part X. 
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EXAMPLE 3 

Consider an airline reservation system. Data in this system are arranged in a 
FLIGHT relation in the form 

FLIGHT (FROM, TO, DISTANCE, DTIME, ATIME, COST) 

where FROM is the name of a departure city, to is the name of an arrival city, DISTANCE 
is the mileage between those two cities, DTIME is the time of departure, ATIME is the 
time of arrival, and cost is the cost of a ticket on that flight. TABLE IV presents a 
typical FLIGHT relation. 

| FLIGHT relation I 



| FROM 

1 


| TO 


| DISTANCE 


| DTIME 


| ATIME 


| COST 


| Burbank 


! 

| Reno 


1 

| 350 


j 

| 7:30 AM 


j 

| 8:40 AM 


| $ 250 


| Burbank 


| Reno 


| 350 


| 9:30 AM 


| 10:40 AM 


| 250 


j Burbank 


| Reno 


| 350 


| 11:30 PM 


| 12:30 AM 


1 150 


| Burbank 


j New York 


| 2500 


| 8:00 AM 


| 4:00 PM 


| 650 


j Reno 


| New York 


I 2400 


| 11:15 AM 


| 7:30 PM 


| 595 



TABLE IV 



A query of the form "FIND CHEAPEST FLIGHT BETWEEN [X] and [Y] M is an 
example of a request for information that cannot be obtained merely by retrieving data 
from this database but that can be derived from information in the database, for example 
by finding the fares of all flights between city X and city Y and then comparing the 
various fares to find which is lowest. The response to such a query would be the flight 
number of the cheapest flight between X and Y. 

EXAMPLE 4 

Consider a study of the spread of a sexually transmitted virus. One of the 
questions under investigation is the spread of the virus by heterosexual transmission. 
Records have been compiled of all heterosexual encounters in a defined population. 
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These records are arranged in an epidemic-study database in an ENCOUNTER relation in 
the form 

ENCOUNTER (MALE, FEMALE, DATE, CITY, COUNTY). 

Additional data respecting individuals in the population under study are arranged in a 
PERSON relation: 

PERSON (person, age, vaccinated) 
where vaccinated is a yes-or-no entry indicating whether the person has been vaccinated 
against the virus. Representative ENCOUNTER and PERSON relations are depicted in 
Tables V and VI, respectively. 



1 




ENCOUNTER relation (E) 




I el 


j e2 


j e3 | e4 




! MALE 
t 


j FEMALE 


| DATE | CITY 

t | 


j COUNTY 


i 

| Andrew 


j Susan 


r 1 

| 12-2-88 | Detroit 




| Andrew 


! Mary 


| 9-6-88 | New York 




j John 


j Susan 


| 11-19-88 | Boston 




j Richard 


| Susan 


| 2-14-89 [ Boston 




i Joe 


| Anne 


| 3-25-89 | New York 








TABLE V 








PERSON relation (P) 


j 




i pi 


1 P2 ! p3 






j PERSON 


| AGE j VACCINATED 






j Andrew 


| 25 | YES 






| John 


| 18 | NO 






| Richard 


| 23 | YES 






| Joe 


| 40 ( YES 






| Susan 


| 22 | NO 






{ Mary 


| 29 | YES 






| Anne 


| 17 j NO 


i 



TABLE VI 



A query that requests the names of all women who had encounters with a certain 
man is an example of a request for information that can be retrieved directly. A query 
that seeks the names of all vaccinated women who had encounters with a certain man is 
an example of a request for information that can be derived by (1) retrieving the names of 
all women who had encounters with that man according to the ENCOUNTER relation and 
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(2) checking the names of each of those women according to the PERSON relation to find 
out which of them have been vaccinated. 

Relational Algebra 

As the volume of data in a database grows larger and the nature of the relations 
grows more complex, evaluating complicated queries becomes more difficult and time- 
consuming. Simplifying the task of evaluating a given query is known as "optimizing" 
the query. There may be only a few, or many hundreds, of ways to evaluate a 
complicated query, all of which provide the same answer but some of which are much 
more efficient than others. "Optimizing" such a query may mean choosing the best of all 
possible ways of evaluating the query, but usually "optimizing" means finding a 
reasonable number of ways according to techniques that are known to increase 
computational efficiency and choosing the best of those. 

A set of relational operators collectively referred to as "relational algebra" has 
been developed for use in optimizing the evaluation of a complicated query in a large 
database. A query is translated into a relational algebra expression; the expression is 
simplified according to certain procedures; query plans for evaluating the simplified 
expression are generated; and the most efficient of these plans is selected and carried out 
to provide the desired response. See generally C. J. Date, An Introduction to Database 
Systems (4th Ed.) Vol. I, Addison-Wesley 1986, chapters 13 and 16, and references cited 
therein. 

The relational algebra includes a set of operators. These operators can be 
compared with arithmetic operators such as " + " and "-S-". Just as an arithmetic operator 
operates on one or two "input" numbers and provides a new "output" number, so each 
relational algebra operator takes one or two relations as "inputs" and provides a new 
relation as "output". A few examples of these operators are the SELECT, PROJECT, 
UNION, INTERSECTION , and JOIN operators. 
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The SELECT operator obtains specified rows from a relation. For example, a 
query seeking the names of Andrew's parents could be phrased as "SELECT entries 
respecting Andrew from the PARENT relation" (see Example 1 above). This SELECT 
operation would be represented in the relational algebra as 

In response, the SELECT operator would provide the following new relation: 

Andrew William 
Andrew Mary 

from the PARENT relation. 

The PROJECT operator obtains specified columns from a relation. For example, 
a query seeking the names of all persons who have children could be phrased as 
"PROJECT PNAME entries from the PARENT relation". This PROJECT operation would 
be represented in the relational algebra as 

r pname {PABEm) 
and would provide the following new relation: 

William 

Mary * 

John 

Anne 

Richard 

WDma 

from the PARENT relation. 

The UNION operator collects all the rows of each of two relations. The UNION 
operation is represented in relational algebra as 
A U B 

where A and B are relations. For example, PARENT U FRIEND in the historical 
database would provide a new relation containing parent-names and friend-names of 
everyone who has either a parent or a friend or both. In providing the new relation, the 
name entries in the PARENT relation would be correlated with the name entries in the 
FRIEND relation; for example, the name Andrew in the PARENT relation would be 
considered to refer to the same person as the name Andrew in the FRIEND relation. 
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The INTERSECTION operator collects common rows of each of two relations. 
The INTERSECTION operation is represented in relational algebra as 
A n B. 

For example, PARENT Pi FRIEND in the historical database would provide a new 
relation containing parent-names and friend-names of those persons who have both a 
parent and a friend. 

The JOIN operator combines rows from each of two relations according to a 
specified condition. In relational algebra the JOIN operation is represented as 
At * condition B 

For example, the join operation E **f ema i €=person P in the epidemic-study database of 
Example 4 above would provide a new relation by combining the information in each 
tuple of the ENCOUNTER relation with the information in that tuple of the PERSON 
relation having an entry under person that matches the entry under female in the tuple 
of the ENCOUNTER relation. 

Recursive Queries 

A kind of database query which has grown more important in recent years is a 
recursive query. Such a query can be described as a query which queries itself. A 
recursive query can be evaluated only by deriving information recursively. A general 
discussion of the mathematical concept of recursion can be found in J. Bradley, 
Introduction to Discrete Mathematics, ch. 6, Addison-Wesley 1988; see also E. Roberts, 
Thinking Recursively, John Wiley 1986. 

As a simple example of a recursive query, consider a query of the form "FIND 
GRANDPARENTS OF [X] M directed to the historical database of example 1 above. This 
database contains no information about grandparents. However, the requested 
information can be recursively derived from the information in the database, for example 
by a query of the form "FIND PARENTS OF [FIND PARENTS OF [X]]". 
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If the number of iterations required to evaluate a recursive query is known in 
advance, then the evaluation process is relatively straight-forward. For example, the 
request to find the grandparents of X requires exactly two iterations one to find the 
parents of X and one to find the parents of the parents of X. However, if the number of 
iterations is not known, then the evaluation becomes far more difficult; an example of a 
request in which the number of iterations is not known is a request to find all ancestors 
ofX. 

As the volume of data in a database grows larger and the nature of the relations 
expressed by the data grows more complex, the time required for even a very powerful 
computer to respond to a complicated recursive query can become unacceptably long, 
especially when the number of iterations required to derive the response is not known in 
advance. Accordingly, the efficient evaluation of recursive queries has become a matter 
of critical importance in the design of modern database systems. A comprehensive survey 
of this problem is presented by R Bancilhon and R. Ramakrishnan in "An Amateur's 
Introduction to Recursive Query Processing Strategies" in the Proceedings of (he ACM- 
SIGMOD Conference, Washington, D.C., May 1986. 

The relational algebra does not have recursion operators and hence cannot support 
recursive queries. Some relatively simple recursive queries can be expressed in transitive 
closure form, and transitive closure operators have been proposed for use in translating 
such queries into relational algebra expressions (R. Agrawal, "Alpha: An Extension of 
Relational Algebra to Express a Class of Recursive Queries", Proceedings of the Third 
International Conference on Data Engineering, Los Angeles, California, February 3-5, 
1987; S. Ceri et aL, "Translation and Optimization of Logic Queries: the Algebraic 
Approach", Proceedings of the Eleventh International Conference on Very Large Data 
Bases, Kyoto, Japan, August 1986). However, not all recursive queries can be expressed 
in transitive closure form. 
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From the foregoing, it will be apparent that there is a need for a way to optimize 
recursive queries, especially those which cannot be expressed in transitive closure form, 
for efficient evaluation in large and complex database systems. 

SUMMARY OF THE INVENTION 

The present invention provides a method of evaluating a recursive query in a 
database system by translating the query into an expression that includes a novel fixpoint 
operator and using a novel set of transformation procedures to simplify the translated 
query. 

Briefly and in general terms, a method of evaluating a recursive query includes the 
steps of translating the query into a relational algebra expression that includes a fixpoint 
operator, optimizing the expression according to a set of transformation procedures, and 
evaluating the optimized expression by reference to data in the database. The 
transformation procedures include commuting a projection operation with a fixpoint 
operation, commuting a selection operation with a fixpoint operation, distributing a join 
operation over a fixpoint operation, and regrouping a join operation and a fixpoint 
operation. 

Regrouping means applying the commutation and association rules, typically to an 
expression having a fixpoint and several join operators. The selection operation may be a 
selection predicate on a direct mapping column, a global selection predicate, or a 
selection predicate that includes a join operation. 

Other aspects and advantages of the present invention will become apparent from 
the following detailed description, taken in conjunction with the accompanying drawings, 
illustrating by way of example the principles of the invention. 
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BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a data flow diagram depicting a preferred embodiment of a method 
of optimizing recursive queries according to the invention; 

FIG. 2 is a flow diagram depicting initial and recursive inputs of a fixpoint 
operator as referenced in the "translate using ® " process of FIG, 1; and 

FIG. 3 is a flow diagram depicting a generalized version of the fixpoint operator 
shown in FIG. 2. 

DESCRIPTION OF THE PREFERRED EMBODIMENT 

As shown in the drawings for purposes of illustration, the invention is embodied in 
a novel method of evaluating recursive queries in a database system. Relational algebra 
provides a powerful technique for optimizing the evaluation of database queries, but 
recursive queries have not been amenable to such optimization techniques. 

In accordance with the invention, a novel fixpoint operator and new transformation 
procedures are provided for translating a recursive query into a relational algebra 
expression and then simplifying that expression. In the form of this simplified 
expression, the query can be evaluated much more efficiently than would otherwise be 
possible. 

As shown in state diagram form in FIGURE 1, a method of evaluating a recursive 
query of a database 11 comprises translating a recursive query into an expression that 
includes a fixpoint operator, as indicated by a "translate using ® " process circle 13; 
optimizing the expression according to a set of transformation procedures as indicated by 

an "optimize" process circle 15; and evaluating the optimized expression, as indicated by 

* 

an "evaluate" process circle 17, by reference to data in the database 11. 

The recursive query is received from a user as indicated by an input box 19 and 
an arrow extending from the box 19 to the "translate" circle 13. The "user" may be a 
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person at a computer terminal, but the user could also be, for example, an electronic 
device, an application program, or the like. Similarly, the result of evaluating the 
optimized query is provided to the user as indicated by an output box 21 and an arrow 
extending from the "evaluate" circle 17 to the box 21. The user who receives the output 
is usually the same user as the one who generated the query, but this need not be the 
case; the user that generates a query could specify that the result be sent somewhere else. 

Several novel transformation procedures are provided according to the invention. 
These new transformation procedures include commuting a projection operation with a 
fixpoint operation as indicated by a "commute projection" process circle 25, commuting a 
selection operation with a fixpoint operation as indicated by a "commute selection" 
process circle 27, distributing a join operation over a fixpoint operation as indicated by a 
"distribute join" process circle 29, and regrouping join and fixpoint operations as 
indicated by a "regroup" process circle 31. 

Arrows extend in both directions between the "optimize" process circle 15 and 
each of the process circles 25, 27, 29 and 31. These arrows indicate that application of 
one transformation procedure may result in an expression that requires application of 
another transformation procedure and that the optimization process may require one or 
more than one application of any given transformation procedure during the course of 
simplifying a given expression. Thus, depending on the characteristics of the expression 
being optimized, various ones of the transformation procedures may be used at various 
times during the optimization. Of course, some of the procedures may not be used at all 
in a given case. 

Commuting a selection operation with a fixpoint operation comprises commuting a 
selection predicate on a direct mapping column with a fixpoint operation as indicated by a 
"direct map" process circle 33, commuting a global selection predicate with a fixpoint 
operation as indicated by a "global" process circle 35, and commuting a selection 
predicate that includes a join with a fixpoint operation as indicated by a "join" process 
circle 37. Arrows extend in both directions between the "commute selection" circle 27 
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and each of the circles 33, 35 and 37 to indicate that one or more than one of the 
procedures respecting commuting a selection and a fixpoint operator may be used as 
needed; of course, none of the "commute selection" procedures may be required in some 
cases. 

Regrouping a join operation and a fixpoint operation comprises commutation and 
association as indicated by a "commutation* 1 process circle 39 and an "association" 
process circle 41, respectively. Arrows extending in both directions between the 
"regroup 11 process circle 31 and each of the circles 39 and 41 indicate that one or more of 
the commutation and association procedures may be used. Again, some cases might not 
require either procedure. 

In addition to the novel transformation procedures provided by the invention, a 
number of transformation procedures are already known in the relational algebra- 
Various ones of these may also be used to simplify a recursive query as indicated by a 
"transform procedures" process circle 43. Arrows extending in both directions between 
the circle 43 and the circle 15 indicate that these previously-known transformation 
procedures may be used once, several times or not at all in any given optimization. 

The fixpoint operator according to the invention enhances the declarative power of 
relational algebra by supporting recursive queries. It is expected that the introduction of 
the fixpoint operator will benefit many database and computer applications such as 
computer aided design and manufacture (CAD/CAM), software engineering (CASE), and 
artificial intelligence (AI) applications. 

As with other relational operators, the inputs (operands) and output of the fixpoint 
operator are relations. The fixpoint operator supports least fixed point semantics. The 
fixpoint operator can compute both linear and mutually recursive relations. 



SUBSTITUTE SHEE" 



WO 92/15066 



PCT/US92/01458 



15 

In a simple form the fixpoint operator is defined symbolically by an expression of 
the form 

(Wm) ®ce:Oe( r i-Rn,Rc) 

where 

I j represents the y-th initial input, 
CE represents a Condition Expression, 
OE represents an Output Expression, and 
Rj represents the /-th recursive input. 

This form of the fixpoint operator is depicted diagrammatically in FIG. 2. Each initial 
input I is a relation. There may be one or more such inputs; M initial inputs are shown 
in FIG. 2, Each recursive input R is also a relation. There may be none, one or more of 
these recursive inputs; N recursive inputs are shown in FIG. 2. A recursive input may, 
but need not, be the same as an initial input. The output of the fixpoint operator is fed 
back as a recursive input R c ; this recursive input R c differs from the N other recursive 
inputs in that the input R c is derived by the fixpoint operator whereas the other recursive 
inputs are not. 

More particularly, during a first iteration the initial inputs are utilized to provide a 
first output. This output is fed back as the recursive input R c . The recursive input R c 
and the N other recursive inputs are utilized to provide a second output, and so on for as 
many iterations as are required. 

If an output is not fed back as one of the recursive inputs, the fixpoint operator 
simplifies to conventional join and union operations. 

In general, equality predicates in the condition expression are treated as join 
columns during processing of the ® operator while any other predicates in the condition 
expression represent additional conditions which the tuples of the recursively derived 
relation generated by the ® operator must satisfy. 
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A more general form of the fixpoint operator, corresponding to a plurality of K 
mutually recursive relations, is defined symbolically by an expression of the form 

( Ii " !m) ®CE,:OE l CEjtOEj ( R i " ^ > Ra " R<x )• 

This form of the fixpoint operator is depicted diagrammatically in FIG. 3. Each initial 
input I is a relation. There may be one or more such inputs; M initial inputs are shown 
in FIG. 3. Each recursive input R is also a relation. There may be none, one or more of 
these recursive inputs; # recursive inputs are shown in FIG. 2. A recursive input may, 
but need not, be the same as an initial input. The fixpoint operator provides K recursive 
relations as outputs and these are fed back as recursive inputs R cl through R ^ ; these K 
recursive inputs R c differ from the N other recursive inputs in that the inputs R c are 
derived by the fixpoint operator whereas the other recursive inputs are not. In general, / 
may but need not be equal to K. 

The sets of initial or recursive inputs for each recursive relation need not be 
disjoint. For mutually recursive relations the recursive inputs cannot be disjoint because 
if they were the relations would not be mutually recursive. 

The ultimate output of the ® operator is a single relation representing the 
Cartesion product of the K mutually recursive output relations. Additional relational 
operators can be used to extract individual relations from the output. 

The invention will now be more formally described in mathematical terms. The 
following examples will show how the fixpoint operator can be freely intermixed with 
other relational algebra operators to pose powerful queries. 

The historical "people" database as described in Example 1 above includes the 
base relations PARENT, FRIEND and PERSON as given in Tables I through HI. A 
derived relation ANCESTOR, of the form 

ANCESTOR (DNAME. ANAME) 

where DNAME is the name of a person and aname is the name of an ancestor of that 
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person, is defined as 

PARENT ® mme=aname: ^ pmme {PARENT ANCESTOR). 
The fixpoint operator may then be used to express various recursive queries of the 
"people" database. Specifically, a query of the form "Find all ancestors of John H is 
expressed: 

^aname( G dname= "John* ( PARENT ®name=aname: dname, pname 
(PARENT,ANCESTOR ))) 

A query of the form "Find all people who are either friends of ancestors of John 
or friends of John" is expressed: 

^Jhamei^ dname = "John" « PARENT ® mme=(mam€: dname, pname 
(PARENT, ANCESTOR )) FRIEND )) U ^^ iumm w 

(FRIEND)) 



A query of the form M Find the friends of John's ancestors" is expressed 

T Jhame^dname= "John" « PARENT ®name=oname: dname, pname 
(PARENT ANCESTOR )) -^^^ FRIEND )) 

The factory database of Example 2 above includes the base relations SUPPLIER 
and SUBPART. A derived relation COMP, of the form 

COMP (CP ART, CSUBPART, CQTY), 

is defined as 

SUBPART Qspa^csMpart: s sub pan, cqty*sqty ( SUBPART.COMP ). 
The fixpoint operator may then be used to express various recursive queries such as a 
query of the form "Find the location and quantities of any parts that go into making a 
Locomotive** as follows: 

r csubpart,city\sum(cqty) (SUPPLY ^ =cyi ^ fln (GROUPJtY 
csubpart:csubpart t sum(cqty) (°cpart= "Locomotive"( SUBPART ® spart=csubpart; cpart, 
ssubpart, cqty*sqty (SUBPART.COMP ))))) 
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wherein the subscripts in the GROUPJBY operator indicate the group_by column and 
output columns of the GROUPBY operator. 

Similarly, in the airline reservation database of Example 3, a derived relation 
SHORT_CONNECTION of the form 

SHORT_CONNECTTON(START,D^ 
is defined as 

FLIGHT ® destmation=fiom t catime < dtime, distance < 1000: start, to, cdtime, atime, 
ccost, +cost ( ffi/ <^ SHORTSONNECTION > 

The fixpoint operator is used, for example, to express a query of the form "Find 
the minimum cost of all flights between London and San Francisco under the condition 
that the distance between each pair of connecting points is less than 1,000 miles" as 
follows: 

Min cost &start= "SF", destination^ "London" ^distance < 1000 ( ^GHT) 
® destination=from, catime < dtime, distance < 1000: start, to, cdtime, atime, 
ccost + cost FLIGHT* SHORT JOONNECTION ) 

A main task to be performed by a query optimizer is to rearrange the sequence of 
operations in an expression of a query for more efficient evaluation. Starting with an 
initial form generated by a parser, the query expression usually undergoes a sequence of 
transformations based upon certain heuristic rules or execution cost comparisons. The 
transformations usually include: 

Performing selections and projections as early as possible, 

Combining sequences of the same operation (e.g. selection or projection) into one 
operation, 

Commuting selection or projection operations with join or Cartesian product 
operations, 

Commuting join (or Cartesian product) operations, and 
Re-associating join (or Cartesian product) operations. 
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(See, generally, Ullman, J., Principles of Database Systems (2nd ed.), Computer Science 
Press, Maryland, 1982 for more discussion of transformations). 

These transformations are made possible by properties (such as commutativity and 
associativity) which are inherent in these operators. In order to avoid the development of 
a new paradigm for dealing with recursive queries, the present invention stays within the 
realm of relational algebra so that existing efficient implementation algorithms and other 
useful techniques (e.g. optimal plan search mechanisms) can be employed. However, not 
all of the common algebraic properties are held between fixpoint operators and other 
relational algebra operators. To integrate the fixpoint operator into an existing query 
optimization process, the valid transformation rules in the extended relational algebra 
have to be identified. 

Example 4 above (the study of the sexually transmitted virus) will be used to 
illustrate these rules. 

Assume that one of the questions under investigation is the spread of the virus by 
heterosexual transmission. To this end, a record has been kept of all heterosexual 
encounters in a population under study. The ENCNTR relation is used to store these 
records. The relation contains additional information about each person in the population 
under study. Now consider the two derived relations EXPSDFML (MALE, FEMALE, DATE, 
CITY, COUNTY) and EXPSDML (FEMALE, MALE, DATE, city, COUNTY). The relation 
EXPSDFML is defined as: 

T *«[ *el, e2 t e3, e4, e5 E M r e2, e3 t e4 t e5 E ]> 

®lf2=mhJ3<m3 t m2=eh m3<e3:fl t e2, e3. e4, e5\ y [m2=fl, m3<J3, J2=e2,j3<e3 

ml. el eJ, e4. e5] « E > F > M >> ( W »> 
The projection is needed since the output of the fixpoint operator is, by definition, the 
cross product of relations F and M. There is no need to compute the cross product of F 
and M in such a case, and the query processor will detect that. Similarly, the relation 
EXPSDML is defined as: 

T eh e2, e3. e4. e5 E ^ I T e2, eh e3, e4, e5 E 3) 
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®[f2=ml,f3<m3, m2=el, m3<e3 :fl t e2, e3, e4. e5\, [m2=fl. m3<f3,f2=e2,f3<e3 
: ml,el,e3,e4,e5]((E>M)AE,F,M))) 

Tables V and VI above illustrate the ENCOUNTER (E) and PERSON(P) relations. 
The EXPSDFML(F) AND EXPSDML(M) relations are illustrated in the following tables 
VII and Vm. 



i n 

! MALE 



EXPSDFML relation (F) 
| fZ | 0 | f4 | 5 

| FEMALE | DATE | CITY j COUNTY 



TABLE VH 



| EXPSDML relation (M) 

j ml | m2 | m3 j m4 j m5 

| FEMALE j MALE | DATE | CITY j COUNTY 

I— i— !- I— I— 

TABLE Vm 



Now, consider query Ql which finds all vaccinated females who might have been 
exposed to the virus either directly or indirectly through make carrier X and such that all 
encounters leading to each female have taken place in New York. The query is expresses 
in the extended relational algebra as follows: 

*j2 ( 0/7= *x% G:f4= "NY", G: m4= "NY", p3=true ( P *V=/2 (ff T el t e2 t e3, e4, e5 
[ T e2, el, e3, e4 $ e5 E ^ ®\J2=ml,f3<m3, m2=el, m3<e3 :fl, e2, e3, e4, e5\ t 
[Tri2=fl,m3<f3,f2=e2,j3<e3:mL el t e3. e4 t e 5](( E > F > M )> ( E > F > M ))))) 

Note that a "G" tag attached to a selection condition indicates that it is a global 
one. A selection predicate is said to be global if it is applied at each iteration during the 
generation of the recursive relation defined by the fixpoint operation. In other words, 
once a tuple of an input relation fails to satisfy a global selection predicate, it will be 
excluded from consideration in any of the subsequent recursive computations. (The 
detection of global predicates will be discussed later.) 
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In general, for queries on recursive relations, a detailed semantic analysis is 
required to determine the legal transformations that can be applied to an algebraic 
expression. The present discussion is limited to transformation rules which do not require 
any semantic query analysis. Formal proofs of the rules are omitted because of their 
length and instead each rule is motivated either with a sketch of a proof or with detailed 
examples. 

1. Commuting projection operation with fixpoint operation 

Rule 1: Let IINP t and RINPf be the initial and recursive input expressions for 
mutually recursive relation R { of the fixpoint operation. PE- t denotes the set of attributes 
of /?, in the projection operation. SE t denotes the set of attributes of R f in the selection 
operation. CE, and OE l denote the sets of attributes in the condition expression and 
output expression of R t in the fixpoint operation. AfllNPj) denotes the set of attributes in 
the initial input expression and AfRINPj) the set of attributes in the recursive input 
expression for relation Then 

T ™1 PEn ia SEj SEn^NPj , .... IINPJ * C£j . ^ ^ . ^ 

(R1NP 1 BWiffl m *PEj PE„ (°SEj SE n «*PE>j ™»J 

llNP n ) <g> CE] . 0Elj ^ . 0£ , n (VpEWj umpj, . ( rpEmnRIN p n))) 

where 

PE\ = (PE i U SE; U (Uj? =i CE ; .)) n A ( IINP^, 
PE] = {PEi U SE t U (U] = jCEj ))DA( RINP t ), and 
OE\ = (PE { U SE t U {UJ^jCEj)) n 0£. 

Sketch of proof: In principle, values for columns not required in any subsequent 
operations can be discarded. Therefore, during the process of inputs for the fixpoint 
operation, only the values for columns being referenced in the condition or output 
expressions of the fixpoint operator or in other subsequent operations need to be 
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retrieved. For example, using query Ql, this rule can be applied to transform it as 
follows: 

*fl( <T fl= T, G:f4= "NY", G: m4= "NY", p3=true ( P>< pl=J2 (& T A el, e3, e4, eS®' 

t T *2, el, e3, e4, e5 E ^ ®[j2=ml,f3<m3, m2=el, m3<e3 :fl, el, e3 t e4, e5], 
I nQ=fl, m3<J3, J2=el, J3<e3 ; ml, el, e3, e4, e5 ] (( E > F > M X ( E,F,M ))))) = 

**fi ( a fl= m X n , G:f4= "NT, G: m4= "NY", p3=rrue ( P>< pl=J2 (& T el, el, e3, e4 E 3* 

l T el, el, e3, e4 E ^ ®[J2=ml,j3<m3, travel, m3<e3 :fl, el, e3, e4], [ m2=fl, 
m3<J3,JZ=el,J3<e3:ml, el, e3, e4](( T el, el, e3, e4 E > T fl,j2,f3,f4 F > T ml, m2, 
m3, m4 M )> ( T eL A e3, e4 E > *fl,fl,j3,f4 F > *ml t ml, m3, nrf^ »>» 

2. Commuting selection operations with fixpoint operations. 

This discussion will explore the heuristic of "performing selection as early as 
possible 1 *. In general, it means to move the selections inside other operators as far as 
possible. For query Ql, all of its selections can be applied to base relations E and P 
directly rather than the final result composed by the join operation and fixpoint operation. 
The original expression is then translated into: 

r fl^ a p3^tme^pl^fl (& a el="X m , e4= "NY" ( r el , A e3, e4, eS^ 
[^="Wy"< T e2, el, e3, e4, eS^ ®\fl=ml. J3<m3, ml=el, m3<e3 :fl, A e3, e4, 
e5], [m2=fl, m3<J3,fi=el,J3<e3 : ml, el, e3, e4, e5\ 
(((*e4="NY" E )> F > M )> ((*e4=-NY"7 E> >> F > M W 

In the above expression, the global selections (G:f4= w NY l, ,G:m4= B NY t ') are 
converted to regular selections and applied to both initial and recursive inputs of the 
fixpoint operation. The purpose of this translation is to reduce the sizes of the operands 
for each operation. If the fixpoint operator ® is viewed as a generator of a directed 
graph consisting of all paths leading to all possible answers, an early selection on the 
initial input and recursive input relations has the effect of eliminating the unqualified 
paths in the graph before they are generated. See, generally, Ioannidis, Y., and Wong, 
W., "On the Computation of the Transitive Closure of Relational Operators", Proc. of 
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12th Int. Conf on VLDB, Tokyo, Japan, August, 1986; Jagadish, H., Agrawal, R,, and 
Ness, L M "A Study of Transitive Closure as a Recursive Mechanism", Proc. of ACM- 
SIGMOD 1987 Int. Cortf. on Management of Data, San Francisco, California, May 1987; 
and Lu, H., "New Strategies for Computing the Transitive Closure of a Database 
Relation", Proc. of 13th. Int. Conf on VLDB, Brighton, England, September, 1987. 

While the legality of moving global selections into the fixpoint operator will be 
apparent, the movement of the non-global selection » x » ) into the initial input needs 
some explanation. After a careful examination, it can be found that the two forms of the 
query are semantically equivalent. Hie is because the selection (o^ = » x „) is applied to a 
direct mapping column of the recursive relation EXPSDFML. A column of a recursive 
relation is considered a direct mapping column if the output expression for that column 
consists of that column itself only. The crucial characteristic of a direct mapping column 
is that is acquires its entire set of values from the initial input relation(s). The values it 
assumes during each subsequent recursive iteration are always taken from the value set of 
its own initial input and are not computed from values of other resources. Thus, once the 
set of values of the initial input is determined, no new values are added to the column. A 
careful look at the definition of EXPSDFML will show that new values are added to 
columns f 2 , ...,f 5 during each recursive iteration but not to column f . 

The concept of direct mapping column is very similar to that of invariant column 
introduced in Devanbu, P. and Agrawal, R., "Moving Selections Into Fixpoint Queries", 
Proc. of 4th Int. Conf. on Data Engineering, Los Angeles, February, 1988. Direct 
mapping columns are actually a subset of invariant columns. However, the detection of 
the more general invariant columns requires a detailed analysis of the selection predicates. 
The advantage of concentrating on direct mapping columns is that their detection is trivial 
and they cover the majority of the cases for which the payoff for performing early 
selections is substantial. 

Selections which are neither global nor applied to direct mapping columns will 
now be considered. Let Q2 be the query to find all females who might have been 
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exposed to the virus either directly or indirectly through male carrier X and such that the 
last encounter took place in New York. The query is expressed in the extended relational 
algebra as follows: 

T fi( a fl= "X", f4= "NT* & r el, e2, e3, e4, e5 t T e2, eh e3, e4, e5^ 
®\f2=ml,f3<m3, m2=el, m3<e3 :fl. e2, e3 t e4, c5], [m2=fl, m3<f3,f2=e2, 
J3<e3 ; ml, el, e3, e4, e5 ] (E.F t A£)))) 

Distributing the selection (<^_ »#y« 5 ) over the fixpoint operator would result in: 

X J2 (°/7= *X" <&(*e4= "NY" ( x eJ, e2, e3, e4, e5 E ^e4= "NY" 

( T *2, el, e3, e4, eS^ ®[f2=ml, J3<m3, m2=eh m3<e3 :fl, e2, e3. e4 $ <?5], 

[m2=fl, m3<J3,J2=e2,j3<e3 : ml. el, e3, e4, e5] <&° e 4="NY* E WW* 

However, the above query expression will not generate the complete set of 
answers. This is due to the exclusion of qualified "bridge tuples" from the intermediate 
results used for computing the EXPSDFML and EXPSDML relations. According to the 
above expressions, only those tuples satisfying -^yE ) participate in the initial and 
recursive inputs to the relation EXPSDFML. In order to produce the complete set of 
answers, all qualified "bridge tuples* need to be included. It means that all tuples which 
do not satisfy (<r^_ E ) must still be saved for subsequent computation to avoid the 
loss of certain answers. 

The rules governing the movement of selections into fixpoint operators thus 
depend on whether the selections are global, and if not global whether they apply to direct 
mapping columns. They are formulated as follows: 

A. Commuting selection predicates on direct mapping columns with fixpoint o perations. 
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Rule 2: Let SED^ be the set of selection predicates on direct mapping columns for 
mutually recursive relation R- 9 where SED i contains no reference to any column of 
mutually recursive relation Rj where / ^ /. Then 

°SED! , .... SED n (VMPl IlNP n ) ® ^ . 0E] ^ . 0En (RINPj , 

- MNP n )) = (cseojIINPj , .... asa^miPt ) • ttj . 0E] C£/j . QEn 

(RINP ]p ...,RINP n )) 

For example, as seen earlier, this rule was used to move the selection predicate 
(ofi- » x » ) into input relation E for query QL 

B. Commuting global selection predicates with fixpoint operations 



Rule 3: Let SEG i be the set of all global predicates in the selection expression 
referencing attributes of mutually recursive relation R t where SEG i contains no reference 
to any column of mutually recursive relation Rj where / 5* j; and let SEGi be the same 
set of predicates after the "G M tag has been removed. Then 

G SEGi SEG n (( IINP l> -> "N? n ) ® . 0E] C£n . QEn (RINPj 

WW - ('SEG^l > - 'SBGn-BHPn) ® CEj : OEj CE n , OE n 

(<r SEGl > RINPj , <* SEGn *RlNP n )) 

For example, as seen earlier, this rule was used to move the global selection predicates 

(f 4 = "NY 0 ) and (m 4 = "NY") in query Ql into the initial and recursive inputs of the 

fixpoint operator. 

The designation of global predicates is left to the user (see, generally, Rosenthal, 
A., Heiler, S., Dayal, U., and Manola, F M "Traversal Recursion: A Practical Approach 
to Support Recursive Applications", Proc. ofACM-SIGMOD 19861m. Conf. on 
Management of Data, Washington, D.C, May 1986). As was exposed earlier, the 
detection of direct mapping columns is straightforward. The detection of such columns 
may be performed when a recursive relation is defined and that information permanently 
stored in the system catalogs. 
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Thus far, selections wherein all the columns involved belong to a single mutually 
recursive relation have been considered. Selections of the form f t 0fj where ^ is a 
column of mutually recursive relation R t , fj is a column of mutually recursive relation Rj 
where / ^ j, and 6 is a comparison operator, will now be discussed. 

Consider query Q3 which finds all males who might have been indirectly exposed 
to the virus through make carrier X. The query is expressed as follows: 

T m2 ( ff /7= "X", J2=ml <& T eh e2, e3, e4, eS E 3> * T *2, eh e3, e4, e5 E l 
® [f2=ml,J3<m3, m2=el, m3<e3:fl, e2, e3 f e4, e5] t [m2=fl, m3<f3,J2=e2, 
J3<e3 : ml, eh e3, e4, e5] (MM), (E,F,M)))) 

The selection f2 = ml is nothing but a join of the two recursive relations 
EXPSDFML and EXPSDML. This will be detected by the query optimizer and query Q3 
will be translated to: 

T m2 "X" ( r F^ T eh e2, e3, e4, e5 E J' I T e2, el. e3, e4, e5 E D 
®\f2=mhj3<m3, m2=el, m3<e3 :fl t e2, e3, e4, e5] 9 [m2=/Z, m3<f3,j2=e2, 
J3<e3: mh el w e3, e4, e5\ ((E,F f M)>{E,F,M))) »<j2= m 2 * M (& T ehe2. e3. e4, e5^ 
t T e2, el, e3, e4, e5 E ^ ®\f2=mhj3<m3 t m2=eh m3<e3 :fl, e2, e3, e4, e5 ], 
[m2=fl, m3<J3,J2=e2,J3<e3 : ml, eh e3, e4, e5] i( E > F > M ), (E,F,M ))))) 

At execution time, the fixpoint operation will be performed only once, generating 
both recursive relations EXPSDFML and EXPSDML simultaneously. A join will then be 
performed on both relations. That is, no cross products or projections of relations 
EXPSDFML and EXPSDML will actually take place to evaluate query Q3. This rule is 
formulated as: 

C. Detection of join operations 

Rule 4: Let^ 8 Jj be a selection where f t is a column of mutually recursive 
relation R; , fj is a column of mutually recursive relation Rj where / ^ /, and 9 is a 
comparison operator. Then 
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<*fi 6 j-MInpj], flnp n ]) ® { condi ': Out]) [Cond n : Out n ] (Recinpj 

Recinp n )) = r Ri (([lnp]] 9 [Inp n ]) ® { Condi : Out]] [Cond n ; Out n \ 

{Recinpj , Recinp n )) x T Rj (([Inpj], [Inp n ]) ® ,0^7 Out] ] 

[Goirfo : Out n j {Recinpj , /tecw/? rt )) 

3. Distributing join operations over fixpoint operations 

In traditional quay optimization, selections ate moved inside joins to reduce the 
size of the operands of the join operations. This is usually a good strategy because 
selections always reduce the sizes of their operands. On the other hand, the result of a 
join operation may be smaller or larger than the sizes of its operands. But the result of 
fixpoint operation is always larger than the size of its initial inputs. Therefore, a query 
optimizer should never consider moving a fixpoint operator inside a join operator, but 
should assess the value of moving a join operator inside a fixpoint operator. 

Consider again query Ql. One can safely move the join operator (without the 
selection) inside the fixpoint operator. The query translates to: 

r J2 <°/7= "X\ G: f4= "NY", G: m4= "NY", p3=true «£ P *V=e2 E 3» t £ D 
®\f2=ml t f3<m3, m2=el, m3<e3 :fl> e2, e3, e4, e5,p2,p3\ % [m2=fl, m3<J3, 
J2=e2,J3<e3 : ml, el, e3, e4, e5\ # p>< pl=e2 E * F > M )> (E f F f M)))) 

However, the distribution of join operators over fixpoint operators is not always as 
trivial. As an example, consider a slightly modified schema where the relation PERSON 
is replaced by the relation VACCINATED-PERSON(person, age) which contains the age 
of each vaccinated person in the population under study. The relation is depicted in the 
Table IX as: 

j VACCINATED PERSON relation (V) | 
I vl | v2 | 

j PERSON | AGE j 

I I I 

TABLE DC 
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Now, let Q4 be the query to find all vaccinated females and their age if they have 
been exposed to the virus either directly or indirectly through male carrier X. The query 
is expressed as follows: 

T J2,v2( (T fl= m X m ( Vt "vl=fl MPl* I £ 3> ®\f2=ml,f3<m3, m2=el, m3<e3 :fl, el, 

e3, e4. e5\, [m2=fl, m3<fi,J2=e2,J3<e3 : ml, el, e3 t e4, e5] 

((E,F,M ), (E,F,M ))))) 

Attempting to distribute the join operator results in: 

r j2,v2( (T fl=''X m (& v>< vl=e2 E ]> ®\f2=ml t f3<m3, m2=el, m3<e3:fl, e2, e3. 
e4, e5, v21, [m2=fl, m3<J3,J2=e2,j3<e3 : ml, el, e3, e4,e5\ 
(( v "v]=e2 E > F > M )> (S^AI)))) 

However, this transformation will not produce the correct result. This is again 
due to the loss of qualified "bridge tuples". Although non-vaccinated females are not 
requested in query Q4, their presence during the iterative process is essential to finding 
all vaccinated females who have been exposed indirecdy to the virus carried by "X". In 
order to preserve the correct answer, the join operator has to transform to a right outer 
join operator (see, generally, Date, C, Relational Databases; Selected Writings, 
Addi son-Wesley Publishing Company, 1986) as it moves inside the fixpoint operator. 
During query execution, all tuples which are strictly the result of the outer join may be 
marked so they are eliminated from the final result. 

If the join is over a direct mapping column, the join operator need not be 
converted to a right outer join operator. Consider the modified schema again and let Q5 
be the query to find all vaccinated males, their ages, and all females who have been 
exposed to them directly or indirectly. The query is expressed as follows: 

r fl, v2,J2 ( V **vl=fl W£L Kl) ®\f2=ml,j3<m3, m2=el, m3<e3 ;fl, el, e3, e4 t 
rf], [m2=j7, m3<J3,J2=e2.J3<e3 ; ml, el, e3, e4, e5] d E > F > M )> {E,F,M)))) 
The distribution of the join operator results in: 
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T fl, v2 t J2 ((l Vt *vl=el E *l> t £ D ®\ft=ml,f3<m3, m2=el, m3<e3:fl, e2, e3, e4, 

e5, v2], [m2=fl, m3<J3,J2=e2,fl<e3 : ml, el, e3> e4 t e5\ 

{(V" v l= el E>F,M)AE,FM))) 
which will produce the correct answers because the join was over a direct mapping 
column. 

For simplicity, the distribution of joins is performed over the initial inputs and 
non-recursive relations in the recursive inputs and only when the join columns are 
restricted to a single non-recursive relation in the inputs. 

Rule 5: Let 11NP 1 and RINP^ be the initial and recursive input expressions for 
mutually recursive relation R { of a fixpoint operation. Let REGEXP be a regular 
relational expression whose output is, as usual, a single relation. Consider the 
expression: 

REGEXP x x$fy ((lINPj IINP N ) ® CE] :0EJ CEn :En (RINPj 

RINP n )) 

where x is one of the columns of REGEXP, fj is one of the columns of mutually recursive 
relation Rj, and 0 is a comparison operator. Consider the recursive input expression 
RINPj for relation Rj. It consists of the recursive relations Rj , R n , and some non- 
recursive relations NRj , NR m . Similarly, the initial input expression IINPj for 
relation Rj consists of relations /j,...,*/* If column^ takes all its initial inputs from a 
single column of a single input relation l k and all its recursive inputs from a single 
column of a non-recursive relation NR V , then the above expression is equivalent to: 

{UNPj, IINPj_j , (Ij REGEXP* xefjt l k , /,), IINPj+j , ...,lINP n ) 

®CE] • OE] CE n : OE n <& NP 1 — ■ NNP j-l * m l ■ — REGEXP<» X e ff 

NR y ,...,NR m , Rj R n ) 9 RINPj+j , ...,RINP n ) 

where <*> indicates a right outer join or regular join operator, depending upon whether the 

join column fj is a direct mapping column or not, and where fj ' is the appropriate column 
in relation I k and/- " is the appropriate column in relation NR V . 
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4. Commuting and associating join operations 

Rule 6: The commutative and associative algebraic laws for join operations still 
hold if each fixpoint operation is treated as a whole. That is, 

A m (B ® (C,D ))xE = AxEx(B® (C,D )), and 

(AxB)x(C® (D f E )) b ^ m (fl x (C ® (D,£ ))) 
These commutative and associative properties allow the query optimizer to choose the 
best execution operations. 

The algebraic properties of fixpoint operators developed above give the query 
optimizer more leeway in choosing efficient execution strategies. The transformation 
rules representing these properties can be added in a fairly straightforward manner to 
most existing query optimizers. 

Most implementations of logic databases (see, for example, Morris, K., Ulman, 
J., and Gelder, A., "Design Overview of the NAIL System", Proc. 3rd Int. Coif, on 
Logic Programming, 1986, and Zaniolo, C, and Sacca, D., "Rule Rewriting Methods for 
Efficient Implementation of Horn Logic", MCC Technical Report DB-084-87, March 
1987) do not rely on any statistical information to determine their execution strategy. 
They commonly use simple heuristics which choose to extend the predicate with the 
largest number of bound arguments. One of the most promising features of the present 
approach is that it is targeted at existing relational query optimizers. Thus, the 
transformation rules presented herein become a tool for the query optimizer to choose 
among a menu of execution strategies based upon the estimated execution costs associated 
with each form a query can take. 

From the foregoing it will be appreciated that the invention provides an effective 
and efficient method of evaluating linear and recursive queries in large databases. 
Existing relational techniques are integrated with the novel fixpoint operator and 
transformation procedures provided by the invention to optimize even very complex 
recursive queries. 
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Although a specific embodiment of the invention has been described and 
illustrated, the invention is not to be limited to the specific forms or arrangements of parts 
so described and illustrated, and various modifications and changes can be made without 
departing from the scope and spirit of the invention. Within the scope of the appended 
claims, therefore, the invention may be practiced otherwise than as specifically described 
and illustrated. 
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CLAIMS 

L A method of evaluating a recursive query (19) of a computerized database 
(11), the method comprising: translating a recursive query into an expression that includes 
a fixpoint operator (13); optimizing the expression according to a set of transformation 
procedures (15); and evaluating the optimized expression (17) by reference to data in the 
database (11). 

2. A method according to claim 1 wherein one of the procedures comprises 
commuting (25) a projection operation with a fixpoint operation. 

3. A method according to claim 1 wherein one of the procedures comprises 

i 

commuting (27) a selection operation with a fixpoint operation. 

4. A method according to claim 3 wherein commuting a selection operation with 
a fixpoint operation comprises commuting a selection predicate on a direct mapping 
column (33) with a fixpoint operation. 

5. A method according to claim 3 wherein commuting a selection operation with 
a fixpoint operation comprises commuting a global selection predicate (35) with a fixpoint 
operation. 

6. A method according to claim 3 wherein commuting a selection operation with 
a fixpoint operation comprises commuting a selection predicate including a join (37) with 
a fixpoint operation. 

7. A method according to claim 1 wherein one of the procedures comprises 
distributing a join operation (29) over a fixpoint operation. 

8. A method according to claim 1 wherein one of the procedures comprises 
regrouping a join operation (31) and a fixpoint operation. 
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9. A method according to claim 8 wherein regrouping comprises 
commutation (39). 

10. A method according to claim 8 wherein regrouping comprises 
association (41). 

11. A method of evaluating a recursive query of a computerized database (11), 
the method comprising: in a computer, translating (13) a recursive query provided by a 
user into an expression that includes a fixpoint operator; automatically optimizing (15) the 
expression in the computer according to a set of transformation procedures (43) stored in 
the computer; and automatically evaluating (17) the optimized expression by reference to 
data in the computer database. 
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